Add force resume support for cross-cluster replication#1685

Open
mohit10011999 wants to merge 19 commits into opensearch-project:main from mohit10011999:ForceResume

mohit10011999 (Contributor) commented May 9, 2026

Description

When the follower's retention leases on the leader cluster expire (e.g., after a prolonged pause or a leader translog rollover), the normal resume API fails because the leader no longer retains the operations the follower needs to catch up from where it left off. This PR adds a force_resume option that triggers a snapshot-based bootstrap to recover replication without requiring a full stop-and-restart cycle.

Solution

Added a force_resume field to the request body of the resume replication API:

POST /_plugins/_replication/<follower-index>/_resume
{ "force_resume": true }

When force_resume=true and retention leases are missing, the ForceResumeCoordinator orchestrates:

  1. Remove index block: unblocks the follower index (same as the stop action).
  2. Acquire retention leases at leaderGlobalCheckpoint + 1 per shard: prevents a race in which the leader purges its translog during the asynchronous snapshot restore.
  3. Delete follower index: triggers the IndexReplicationTask state machine to re-bootstrap from a snapshot via setupAndStartRestore().

On failure at any step, cleanup logic removes any partially acquired leases and re-adds the index block.
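The ordering above, with cleanup on failure, can be sketched in plain Kotlin. This is a minimal sketch, not the PR's ForceResumeCoordinator: the `ForceResumeSteps` interface, its method names, and the `RecordingSteps` fake are hypothetical stand-ins for the real block, lease, and delete calls, and the leader global checkpoints are passed in as a map instead of being fetched from the leader. As in the PR's excerpt, the block is removed before the guarded region, so cleanup covers lease acquisition and deletion.

```kotlin
// Hypothetical stand-ins for the real cluster operations; not the plugin's API.
interface ForceResumeSteps {
    fun removeIndexBlock(followerIndex: String)
    fun addIndexBlock(followerIndex: String)
    fun acquireLease(shardId: Int, retainingSeqNo: Long)
    fun removeLease(shardId: Int)
    fun deleteFollowerIndex(followerIndex: String)
}

// Runs the three steps in order. On any failure after the block is removed, releases
// the leases acquired so far, re-adds the block, and rethrows the original error.
fun executeForceResume(
    followerIndex: String,
    leaderGlobalCheckpoints: Map<Int, Long>, // shard id -> leader global checkpoint
    steps: ForceResumeSteps
): Map<Int, Long> {
    // Step 1: unblock the follower index.
    steps.removeIndexBlock(followerIndex)

    val acquiredLeases = mutableMapOf<Int, Long>()
    try {
        // Step 2: retain leader operations from globalCheckpoint + 1 on every shard.
        for ((shardId, checkpoint) in leaderGlobalCheckpoints) {
            val retainingSeqNo = checkpoint + 1
            steps.acquireLease(shardId, retainingSeqNo)
            acquiredLeases[shardId] = retainingSeqNo // recorded only after success
        }
        // Step 3: delete the follower so the replication task re-bootstraps from a snapshot.
        steps.deleteFollowerIndex(followerIndex)
    } catch (e: Exception) {
        // Best-effort cleanup: a failure here must not mask the original error.
        for (shardId in acquiredLeases.keys) {
            runCatching { steps.removeLease(shardId) }
        }
        runCatching { steps.addIndexBlock(followerIndex) }
        throw e
    }
    return acquiredLeases
}

// In-memory fake that records each call, for illustration and testing only.
class RecordingSteps(private val failDelete: Boolean = false) : ForceResumeSteps {
    val events = mutableListOf<String>()
    override fun removeIndexBlock(followerIndex: String) { events += "unblock:$followerIndex" }
    override fun addIndexBlock(followerIndex: String) { events += "block:$followerIndex" }
    override fun acquireLease(shardId: Int, retainingSeqNo: Long) { events += "lease:$shardId@$retainingSeqNo" }
    override fun removeLease(shardId: Int) { events += "unlease:$shardId" }
    override fun deleteFollowerIndex(followerIndex: String) {
        if (failDelete) throw IllegalStateException("delete failed")
        events += "delete:$followerIndex"
    }
}
```

With a fake whose delete fails, the recorded events end with the lease release followed by the re-added block, which is the cleanup order the description states.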

Backward Compatibility

The force_resume field defaults to false, so existing clients that call _resume with an empty body, or without the field, see no behavior change.
Wire serialization is additive: the new boolean field is appended after the existing fields.
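A minimal sketch of the backward-compatible default, assuming a simplified request model: `ResumeRequestSketch`, the map-based body parsing, and the `java.io` stream encoding are hypothetical stand-ins, not the plugin's request class or OpenSearch's `StreamInput`/`StreamOutput`. It shows a missing body, an empty body, or a body without `force_resume` all resolving to `forceResume = false`, and the new boolean written after the existing field.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.io.DataInputStream
import java.io.DataOutputStream

// Hypothetical, simplified stand-in for the resume request; not the plugin's class.
data class ResumeRequestSketch(val indexName: String, val forceResume: Boolean = false) {

    // The new field is written after the existing one (the "appended" layout).
    fun toBytes(): ByteArray {
        val buffer = ByteArrayOutputStream()
        DataOutputStream(buffer).use { out ->
            out.writeUTF(indexName)
            out.writeBoolean(forceResume)
        }
        return buffer.toByteArray()
    }

    companion object {
        // A null body, an empty body, or a body without "force_resume" keeps the old behavior.
        fun fromBody(indexName: String, body: Map<String, Any?>?): ResumeRequestSketch =
            ResumeRequestSketch(indexName, body?.get("force_resume") as? Boolean ?: false)

        // Reads fields in the same order they were written.
        fun fromBytes(bytes: ByteArray): ResumeRequestSketch =
            DataInputStream(ByteArrayInputStream(bytes)).use { input ->
                val name = input.readUTF()
                val force = input.readBoolean()
                ResumeRequestSketch(name, force)
            }
    }
}
```

The sketch only round-trips within a single version of the class; it does not model streams exchanged between nodes running different versions.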

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

mohit10011999 and others added 6 commits May 10, 2026 14:14
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
- Removed reimplemented block removal/addition (reuses UpdateIndexBlockAction, same as stop action)
- Removed reimplemented index deletion (reuses same pattern as IndexReplicationTask.cancelRestore)
- Removed reimplemented lease cleanup (reuses existing RemoteClusterRetentionLeaseHelper methods)
- Removed ReplicationMetadataManager dependency from coordinator (not needed)
- Simplified getLeaderGlobalCheckpoint using firstOrNull (same API as RemoteClusterRepository)
- Only genuinely new logic retained: pre-restore lease acquisition at leaderCheckpoint+1

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

github-actions Bot commented May 10, 2026

PR Reviewer Guide 🔍

(Review updated until commit fdf3669)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Possible Issue

If the follower index is deleted between the routing table lookup (lines 111-112) and the metadata lookup during cleanup (lines 181-182), the cleanup logic logs a warning but continues without removing the retention leases on the leader. This leaves orphaned leases that can accumulate over time. The scenario occurs when the index is deleted externally after force resume starts but before cleanup runs.

val indexMetadata = clusterService.state().metadata.index(params.followerIndexName)
if (indexMetadata == null) {
    log.warn("Follower index ${params.followerIndexName} not found in metadata during cleanup. " +
            "Retention leases on leader may be orphaned for shard $shardIdInt.")
    continue
}
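One way to avoid depending on follower metadata during cleanup, along the lines of a later bot suggestion to store the original index UUID before deletion, is to record the follower shard's full identity at the moment the lease is acquired. A minimal sketch, assuming a hypothetical `FollowerShardKey` in place of OpenSearch's `ShardId`/`Index`, and a caller-supplied `removeLease` callback in place of `attemptRetentionLeaseRemoval`:

```kotlin
// Hypothetical stand-in for OpenSearch's ShardId: index name + index UUID + shard number.
data class FollowerShardKey(val indexName: String, val indexUuid: String, val shardId: Int)

// Tracks acquired leases keyed by the follower shard identity captured at acquisition
// time, so cleanup never needs to read follower metadata again.
class AcquiredLeases {
    private val leases = linkedMapOf<FollowerShardKey, Long>()

    fun record(key: FollowerShardKey, retainingSeqNo: Long) {
        leases[key] = retainingSeqNo
    }

    // Attempts every removal even if some fail; returns the keys that could not be removed.
    fun releaseAll(removeLease: (FollowerShardKey) -> Unit): List<FollowerShardKey> {
        val failed = mutableListOf<FollowerShardKey>()
        for (key in leases.keys.toList()) {
            try {
                removeLease(key)
                leases.remove(key)
            } catch (e: Exception) {
                failed += key
            }
        }
        return failed
    }

    fun isEmpty(): Boolean = leases.isEmpty()
}
```

Because the key already carries the UUID, a deleted follower index no longer prevents cleanup, and any removal that still fails is returned to the caller instead of being silently skipped.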
Race Condition

Between removing the index block (line 62) and acquiring retention leases (line 68), the follower index is unblocked and writable. If writes occur during this window, they will be lost when the index is deleted (line 71). The coordinator should either keep the block until after deletion or document this data loss risk.

removeIndexBlock(followerIndex)

// From this point, any failure must clean up leases and re-add the block.
try {
    // Step 2: Acquire retention leases on leader BEFORE deleting follower.
    // This is the only genuinely new logic — prevents the race condition.
    acquirePreRestoreLeases(params, acquiredLeases)

    // Step 3: Delete follower index — same pattern as IndexReplicationTask.cancelRestore()
    deleteFollowerIndex(followerIndex)
Logic Error

When forceResume=true and leases exist (lines 117-118), the code proceeds with normal resume but the comment claims this is intentional. However, the user explicitly requested force resume (snapshot bootstrap), so proceeding with normal resume contradicts the request. Either reject the request or honor the force resume intent.

// For force resume with valid leases, proceed with normal resume
// (forceResume=true but leases exist → no-op, just resume normally)
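Per the author's reply later in this thread, the coordinator runs only when `!resumable && request.forceResume`. The resulting decision table can be sketched as a pure function; the `ResumePath` enum and `chooseResumePath` are hypothetical names, and `resumable` stands in for "the follower's retention leases on the leader still exist".

```kotlin
// Hypothetical names; they model the branch described in this PR, not its code.
enum class ResumePath { NORMAL_RESUME, FORCE_RESUME_BOOTSTRAP, FAIL_LEASES_EXPIRED }

// resumable = the follower's retention leases on the leader still exist.
fun chooseResumePath(resumable: Boolean, forceResume: Boolean): ResumePath = when {
    resumable -> ResumePath.NORMAL_RESUME              // force_resume has no effect here
    forceResume -> ResumePath.FORCE_RESUME_BOOTSTRAP   // !resumable && forceResume
    else -> ResumePath.FAIL_LEASES_EXPIRED             // plain resume cannot catch up
}
```

Whether the `resumable && forceResume` row should instead reject the request is exactly what the Logic Error observation questions; the sketch only mirrors the behavior the PR describes.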


github-actions Bot commented May 10, 2026

PR Code Suggestions ✨

Latest suggestions up to fdf3669

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Track lease only after successful renewal

The acquiredLeases map is updated even when renewRetentionLease is called, but
renewal can fail silently or throw an exception. If renewal fails, the lease may not
be at the expected sequence number, yet cleanup logic assumes it was successfully
acquired at retainingSeqNo.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [127-133]

 } catch (e: RetentionLeaseAlreadyExistsException) {
     retentionLeaseHelper.renewRetentionLease(
         leaderShardId, retainingSeqNo, followerShardId
     )
+    acquiredLeases[followerShardId.id] = retainingSeqNo
 }
-acquiredLeases[followerShardId.id] = retainingSeqNo
Suggestion importance[1-10]: 8

Why: This is a valid concern. The acquiredLeases map is updated even when renewRetentionLease is called in the catch block, but if renewal fails with a different exception, the lease tracking will be incorrect. Moving the map update inside the catch block ensures it only tracks successfully renewed leases.

Medium
Throw exception when routing table missing

Silently skipping lease acquisition when routing table is missing can lead to data
loss. If the follower index exists but has no routing table, proceeding without
leases means the subsequent index deletion and restore will fail to protect against
translog purging on the leader.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [111-116]

 val shards = clusterService.state().routingTable
     .indicesRouting().get(params.followerIndexName)?.shards()
 if (shards == null) {
-    log.warn("No routing table found for follower index ${params.followerIndexName}, skipping lease acquisition")
-    return
+    throw IllegalStateException("No routing table found for follower index ${params.followerIndexName}. Cannot proceed with force resume.")
 }
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that silently skipping lease acquisition when the routing table is missing could lead to issues. However, the current code logs a warning and returns early, which may be intentional for edge cases. Throwing an exception is safer but may be overly strict if the index is being deleted concurrently.

Medium
Ensure lease cleanup even if index deleted

If the follower index was already deleted before cleanup runs (e.g., deletion
succeeded but a subsequent step failed), indexMetadata will be null for all shards.
This causes all lease removals to be skipped, leaving orphaned leases on the leader
cluster that will never be cleaned up.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [180-189]

-private suspend fun cleanupOnFailure(
-    params: IndexReplicationParams,
-    acquiredLeases: Map<Int, Long>,
-    followerIndex: String
-) {
-    // Clean up partially acquired leases — uses existing attemptRetentionLeaseRemoval()
-    if (acquiredLeases.isNotEmpty()) {
-        try {
-            val remoteClient = client.getRemoteClusterClient(params.leaderAlias)
-            ...
-            for (shardIdInt in acquiredLeases.keys) {
-                val indexMetadata = clusterService.state().metadata.index(params.followerIndexName)
-                if (indexMetadata == null) {
-                    log.warn("Follower index ${params.followerIndexName} not found in metadata during cleanup. " +
-                            "Retention leases on leader may be orphaned for shard $shardIdInt.")
-                    continue
-                }
+for (shardIdInt in acquiredLeases.keys) {
+    val indexMetadata = clusterService.state().metadata.index(params.followerIndexName)
+    val followerShardId = if (indexMetadata != null) {
+        ShardId(indexMetadata.index, shardIdInt)
+    } else {
+        ShardId(Index(params.followerIndexName, "_na_"), shardIdInt)
+    }
+    retentionLeaseHelper.attemptRetentionLeaseRemoval(ShardId(params.leaderIndex, shardIdInt), followerShardId)
+}
Suggestion importance[1-10]: 7

Why: The suggestion addresses a real issue where if the follower index is deleted before cleanup, indexMetadata will be null and lease cleanup will be skipped. The proposed solution of using a placeholder UUID ("_na_") allows cleanup to proceed, though it's somewhat hacky. A better approach might be to store the original index UUID before deletion.

Medium
General
Add explicit error handling for coordinator

If executeForceResume throws an exception, the index block has been removed but not
restored, leaving the follower index in an inconsistent state. The exception
propagates without re-adding the block, potentially allowing writes to a partially
configured index.

src/main/kotlin/org/opensearch/replication/action/resume/TransportResumeIndexReplicationAction.kt [103-115]

 if (!resumable && request.forceResume) {
-    // Delegate to force resume coordinator (snapshot bootstrap path)
     log.info("Retention lease expired for ${request.indexName}. " +
             "Force resume requested — initiating snapshot bootstrap.")
     val coordinator = ForceResumeCoordinator(client, clusterService)
-    val result = coordinator.executeForceResume(params)
-    log.info("Force resume coordinator completed for ${request.indexName}: " +
-            "leases acquired at ${result.leaseAcquiredAtSeqNo}, " +
-            "duration=${result.durationMillis}ms")
-    // After coordinator completes, fall through to start IndexReplicationTask.
-    // Since the follower index was deleted, isResumed() returns false,
-    // which triggers setupAndStartRestore() in the state machine.
+    try {
+        val result = coordinator.executeForceResume(params)
+        log.info("Force resume coordinator completed for ${request.indexName}: " +
+                "leases acquired at ${result.leaseAcquiredAtSeqNo}, " +
+                "duration=${result.durationMillis}ms")
+    } catch (e: Exception) {
+        log.error("Force resume failed for ${request.indexName}, coordinator already performed cleanup", e)
+        throw e
+    }
 }
Suggestion importance[1-10]: 5

Why: The suggestion adds a try-catch block that doesn't change behavior—it just logs and re-throws the exception. The coordinator already handles cleanup internally in its own try-catch block (lines 72-76 in ForceResumeCoordinator.kt), so this additional error handling is redundant and only adds logging without improving error recovery.

Low
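Whether moving `acquiredLeases[followerShardId.id] = retainingSeqNo` changes behavior depends on Kotlin's try/catch semantics: a statement placed after a try/catch runs only if the try block or the catch block completes normally, so an exception thrown by the renewal call inside the catch block skips it. A minimal sketch with hypothetical stand-ins (`LeaseAlreadyExists`, `addLease`, `renewLease`) for the real helper calls, keeping the original layout with the map update after the try/catch:

```kotlin
// Hypothetical stand-in for RetentionLeaseAlreadyExistsException.
class LeaseAlreadyExists : RuntimeException("lease already exists")

// Mirrors the original layout: the map update sits after the try/catch.
fun acquireOrRenew(
    shardId: Int,
    retainingSeqNo: Long,
    acquiredLeases: MutableMap<Int, Long>,
    addLease: () -> Unit,
    renewLease: () -> Unit
) {
    try {
        addLease()
    } catch (e: LeaseAlreadyExists) {
        renewLease() // if this throws, the exception propagates out of the function
    }
    // Reached only when addLease() succeeded or renewLease() returned normally.
    acquiredLeases[shardId] = retainingSeqNo
}
```

In this sketch the original placement already records a lease only after an add or renew returns normally, so moving the assignment into each branch does not change which leases end up recorded.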

Previous suggestions

Suggestions up to commit a23f7a8
Category | Suggestion | Impact
Possible issue
Track leases only after successful acquisition

The acquiredLeases map is updated even when renewRetentionLease is called, but if
the renew operation fails (throws an exception), the lease won't actually be
acquired yet the map will be updated. This can cause incorrect cleanup logic. Move
the map update inside a try-catch or ensure it only happens after successful lease
operations.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [127-133]

+try {
+    retentionLeaseHelper.addRetentionLease(
+        leaderShardId, retainingSeqNo, followerShardId
+    )
+    acquiredLeases[followerShardId.id] = retainingSeqNo
 } catch (e: RetentionLeaseAlreadyExistsException) {
     retentionLeaseHelper.renewRetentionLease(
         leaderShardId, retainingSeqNo, followerShardId
     )
+    acquiredLeases[followerShardId.id] = retainingSeqNo
 }
-acquiredLeases[followerShardId.id] = retainingSeqNo
Suggestion importance[1-10]: 8

Why: This is a valid bug fix. If renewRetentionLease throws an exception, the acquiredLeases map would not be updated, but the current code structure places the map update outside the catch block, meaning it executes regardless. The improved code ensures the map is only updated after successful lease operations, preventing incorrect cleanup logic.

Medium
Fail fast on missing routing table

Silently skipping lease acquisition when routing table is missing can lead to
incomplete force resume. If the follower index exists but has no routing table, the
subsequent delete operation will succeed but no leases will be acquired, causing
replication to fail when resumed. Consider throwing an exception instead to fail
fast and trigger cleanup.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [111-116]

 val shards = clusterService.state().routingTable
     .indicesRouting().get(params.followerIndexName)?.shards()
-if (shards == null) {
-    log.warn("No routing table found for follower index ${params.followerIndexName}, skipping lease acquisition")
-    return
-}
+    ?: throw IllegalStateException("No routing table found for follower index ${params.followerIndexName}")
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that silently skipping lease acquisition when routing table is missing could lead to incomplete force resume. Throwing an exception would trigger cleanup and fail fast, which is more appropriate than proceeding without leases. However, the current behavior may be intentional for edge cases where the index is being deleted.

Medium
General
Optimize metadata lookup in cleanup loop

The cleanup logic retrieves indexMetadata inside the loop for every shard, but if
the follower index was already deleted (step 3 succeeded before failure), this
metadata lookup will fail for all shards. Retrieve the metadata once before the loop
and handle the null case appropriately to avoid redundant warnings and improve
efficiency.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [178-187]

+val indexMetadata = clusterService.state().metadata.index(params.followerIndexName)
+if (indexMetadata == null) {
+    log.warn("Follower index ${params.followerIndexName} not found in metadata during cleanup. " +
+            "Retention leases on leader may be orphaned.")
+    return
+}
 for (shardIdInt in acquiredLeases.keys) {
-    val indexMetadata = clusterService.state().metadata.index(params.followerIndexName)
-    if (indexMetadata == null) {
-        log.warn("Follower index ${params.followerIndexName} not found in metadata during cleanup. " +
-                "Retention leases on leader may be orphaned for shard $shardIdInt.")
-        continue
-    }
     val followerShardId = ShardId(indexMetadata.index, shardIdInt)
     retentionLeaseHelper.attemptRetentionLeaseRemoval(ShardId(params.leaderIndex, shardIdInt), followerShardId)
 }
Suggestion importance[1-10]: 6

Why: The suggestion improves efficiency by moving the indexMetadata lookup outside the loop and reduces redundant warning messages. However, the improvement is minor since cleanup is a rare failure path, and the current approach provides per-shard diagnostics which could be useful for debugging.

Low
Suggestions up to commit 5017605
Category | Suggestion | Impact
Possible issue
Handle missing routing table properly

Early return when shards are null may leave the index in an inconsistent state with
the block removed but no leases acquired. Consider throwing an exception instead to
trigger cleanup and re-add the block.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [111-113]

 val shards = clusterService.state().routingTable
     .indicesRouting().get(params.followerIndexName)?.shards()
-    ?: return
+    ?: throw IllegalStateException("No routing table found for follower index ${params.followerIndexName}")
Suggestion importance[1-10]: 8

Why: The early return when shards is null leaves the index in an inconsistent state with the block removed but no leases acquired. Throwing an exception instead properly triggers the cleanup logic in the catch block to re-add the block and maintain consistency.

Medium
Add explicit null check for seqNoStats

The seqNoStats field may be null even when the shard is found, leading to a
NullPointerException before the exception is thrown. Add an explicit null check for
seqNoStats to provide a clearer error message.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [141-144]

-return statsResponse.shards
+val shard = statsResponse.shards
     .firstOrNull { it.shardRouting.shardId().id == shardId && it.shardRouting.primary() }
-    ?.seqNoStats?.globalCheckpoint
-    ?: throw IllegalStateException("Primary shard $shardId not found or seqNoStats unavailable for leader index $leaderIndexName")
+    ?: throw IllegalStateException("Primary shard $shardId not found for leader index $leaderIndexName")
+return shard.seqNoStats?.globalCheckpoint
+    ?: throw IllegalStateException("seqNoStats unavailable for primary shard $shardId in leader index $leaderIndexName")
Suggestion importance[1-10]: 7

Why: The suggestion improves error handling by separating the null checks for the shard and seqNoStats, providing clearer error messages. However, the original code already handles the null case via the elvis operator, so this is more of a code clarity improvement than a critical bug fix.

Medium
General
Refresh state after index deletion

After force resume deletes the follower index, the subsequent code may attempt to
access the deleted index metadata, causing failures. Verify that the index
recreation or state machine initialization properly handles the deleted index state
before proceeding.

src/main/kotlin/org/opensearch/replication/action/resume/TransportResumeIndexReplicationAction.kt [103-115]

 if (!resumable && request.forceResume) {
-    // Delegate to force resume coordinator (snapshot bootstrap path)
-    log.info("Retention lease expired for ${request.indexName}. " +
-            "Force resume requested — initiating snapshot bootstrap.")
+    log.info("Retention lease expired for ${request.indexName}. Force resume requested — initiating snapshot bootstrap.")
     val coordinator = ForceResumeCoordinator(client, clusterService)
     val result = coordinator.executeForceResume(params)
-    log.info("Force resume coordinator completed for ${request.indexName}: " +
-            "leases acquired at ${result.leaseAcquiredAtSeqNo}, " +
-            "duration=${result.durationMillis}ms")
-    // After coordinator completes, fall through to start IndexReplicationTask.
-    // Since the follower index was deleted, isResumed() returns false,
-    // which triggers setupAndStartRestore() in the state machine.
+    log.info("Force resume coordinator completed for ${request.indexName}: leases acquired at ${result.leaseAcquiredAtSeqNo}, duration=${result.durationMillis}ms")
+    // Refresh cluster state after index deletion to ensure subsequent operations see the updated state
+    clusterService.state()
 }
Suggestion importance[1-10]: 3

Why: The suggestion to call clusterService.state() after index deletion doesn't actually refresh the cluster state—it just retrieves the current state. The cluster state is automatically updated by OpenSearch when the index is deleted, and the subsequent code already handles the deleted index state properly through the isResumed() check mentioned in the comments.

Low
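The separated null checks proposed for seqNoStats can be sketched without OpenSearch types. `ShardStatsStub` is a hypothetical stand-in for one entry of the leader's per-shard stats (the real code reads the shard routing and `seqNoStats?.globalCheckpoint` from the indices stats response); the point is that "no primary found" and "no seqNoStats" each get their own error message.

```kotlin
// Hypothetical stand-in for one entry of the leader's per-shard stats.
data class ShardStatsStub(val shardId: Int, val primary: Boolean, val globalCheckpoint: Long?)

// Returns the primary's global checkpoint, distinguishing "no primary" from "no seqNoStats".
fun leaderGlobalCheckpoint(shards: List<ShardStatsStub>, shardId: Int, leaderIndexName: String): Long {
    val primary = shards.firstOrNull { it.shardId == shardId && it.primary }
        ?: throw IllegalStateException("Primary shard $shardId not found for leader index $leaderIndexName")
    return primary.globalCheckpoint
        ?: throw IllegalStateException("seqNoStats unavailable for primary shard $shardId in leader index $leaderIndexName")
}
```

Replica entries for the same shard id are ignored, matching the `shardRouting.primary()` filter in the suggested code.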
Suggestions up to commit a8e76d6
Category | Suggestion | Impact
Possible issue
Handle null seqNoStats gracefully

The method assumes seqNoStats is non-null, but it could be null if the shard hasn't
been fully initialized. Add a null check for seqNoStats to prevent potential
NullPointerException and throw a more descriptive error.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [141-144]

-return statsResponse.shards
+val shardStats = statsResponse.shards
     .firstOrNull { it.shardRouting.shardId().id == shardId && it.shardRouting.primary() }
-    ?.seqNoStats?.globalCheckpoint
     ?: throw IllegalStateException("Primary shard $shardId not found for leader index $leaderIndexName")
+return shardStats.seqNoStats?.globalCheckpoint
+    ?: throw IllegalStateException("SeqNoStats not available for primary shard $shardId in leader index $leaderIndexName")
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that seqNoStats could be null for uninitialized shards. Adding an explicit null check with a descriptive error message improves robustness and error reporting, though this is a defensive programming improvement rather than a critical bug fix.

Medium
General
Log warning when shards are null

The early return when shards are null silently skips lease acquisition without
logging or validation. This could mask configuration issues where the follower index
exists but has no routing entries. Add a warning log before returning to aid
debugging.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [111-113]

 val shards = clusterService.state().routingTable
     .indicesRouting().get(params.followerIndexName)?.shards()
-    ?: return
+if (shards == null) {
+    log.warn("No routing entries found for follower index ${params.followerIndexName}, skipping lease acquisition")
+    return
+}
Suggestion importance[1-10]: 5

Why: Adding a warning log when shards is null improves debuggability by making it explicit when lease acquisition is skipped. However, this is a minor enhancement since the early return is already a valid handling strategy for this edge case.

Low
Log when metadata unavailable during cleanup

The cleanup loop silently skips shards where metadata is unavailable using continue.
If the follower index was already deleted, all cleanup attempts will be skipped
without logging. Add a log statement when metadata is missing to track cleanup
behavior.

src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt [175-181]

 for (shardIdInt in acquiredLeases.keys) {
-    val followerShardId = ShardId(
-        clusterService.state().metadata.index(params.followerIndexName)?.index ?: continue,
-        shardIdInt
-    )
+    val followerIndexMetadata = clusterService.state().metadata.index(params.followerIndexName)?.index
+    if (followerIndexMetadata == null) {
+        log.debug("Follower index ${params.followerIndexName} metadata not found during cleanup for shard $shardIdInt")
+        continue
+    }
+    val followerShardId = ShardId(followerIndexMetadata, shardIdInt)
     retentionLeaseHelper.attemptRetentionLeaseRemoval(ShardId(params.leaderIndex, shardIdInt), followerShardId)
 }
Suggestion importance[1-10]: 4

Why: Adding debug logging when follower index metadata is missing during cleanup aids troubleshooting. However, this is a low-impact improvement since the continue statement already handles this case appropriately, and the scenario is expected when the index has been deleted.

Low
Add error handling for coordinator failures

After force resume deletes the follower index, the code falls through to start
replication. However, if getLeaderIndexMetadata or subsequent operations fail, the
follower index remains deleted with no block protection. Wrap the fall-through logic
in a try-catch to handle failures gracefully.

src/main/kotlin/org/opensearch/replication/action/resume/TransportResumeIndexReplicationAction.kt [103-115]

 if (!resumable && request.forceResume) {
-    // Delegate to force resume coordinator (snapshot bootstrap path)
-    log.info("Retention lease expired for ${request.indexName}. " +
-            "Force resume requested — initiating snapshot bootstrap.")
+    log.info("Retention lease expired for ${request.indexName}. Force resume requested — initiating snapshot bootstrap.")
     val coordinator = ForceResumeCoordinator(client, clusterService)
-    val result = coordinator.executeForceResume(params)
-    log.info("Force resume coordinator completed for ${request.indexName}: " +
-            "leases acquired at ${result.leaseAcquiredAtSeqNo}, " +
-            "duration=${result.durationMillis}ms")
-    // After coordinator completes, fall through to start IndexReplicationTask.
-    // Since the follower index was deleted, isResumed() returns false,
-    // which triggers setupAndStartRestore() in the state machine.
+    try {
+        val result = coordinator.executeForceResume(params)
+        log.info("Force resume coordinator completed for ${request.indexName}: leases acquired at ${result.leaseAcquiredAtSeqNo}, duration=${result.durationMillis}ms")
+    } catch (e: Exception) {
+        log.error("Force resume coordinator failed for ${request.indexName}", e)
+        throw e
+    }
 }
Suggestion importance[1-10]: 3

Why: The suggested try-catch block is redundant because the coordinator's executeForceResume method already throws exceptions on failure (line 75 in ForceResumeCoordinator.kt), which will propagate naturally. The additional logging doesn't add significant value beyond what already exists.

Low


Persistent review updated to latest commit 5017605

@mohit10011999 (Contributor, Author)

Regarding the AI review comments #1685 (comment) and #1685 (comment):

Regarding the Race Condition comment:
No: retention leases prevent purging by design. Once a lease is acquired at seqNo X, operations with seqNo ≥ X are retained regardless of how far the global checkpoint advances.
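The retention argument above can be illustrated with a toy model. This is a deliberate simplification, not OpenSearch's actual soft-deletes retention policy: here an operation becomes purgeable only once it is at or below the global checkpoint and below every lease's retaining sequence number. The model covers purging only; it does not model writes to the follower while its block is removed.

```kotlin
// Toy model only: an operation is purgeable once it is at or below the global
// checkpoint AND below every retention lease's retaining sequence number.
fun canPurge(seqNo: Long, globalCheckpoint: Long, retainingSeqNos: Collection<Long>): Boolean =
    seqNo <= globalCheckpoint && retainingSeqNos.all { seqNo < it }
```

With a lease acquired at leaderGlobalCheckpoint + 1 = 101, operation 101 and everything after it stay retained even after the global checkpoint advances to 500, while operation 100 may be purged.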

Incomplete Cleanup
Added a warning log followed by continue, instead of silently skipping.

Logic Error
No: the bot misread the condition. if (!resumable && request.forceResume) means the coordinator runs only when leases are missing; if leases exist, it is skipped.

Null seqNoStats
Improved the error message.

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit a23f7a8.

Path | Line | Severity | Description
--- | --- | --- | ---
src/main/kotlin/org/opensearch/replication/action/resume/ForceResumeCoordinator.kt | 136 | medium | IndicesStatsRequest().all() fetches all shard statistics from the remote leader cluster rather than requesting only sequence number stats. This unnecessarily exposes broader cluster state information across the cluster boundary. While not clearly malicious, it is anomalous given that only globalCheckpoint is consumed from the response.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 1 | Low: 0


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.


Persistent review updated to latest commit a23f7a8

… boundary

Signed-off-by: Mohit Kumar <mohitamg@amazon.com>
@mohit10011999 (Contributor, Author)

#1685 (comment)
Addressed as follows:

Replaced .all() with .clear() followed by .translog(true). This tells the leader to compute only translog-related stats, which substantially reduces the payload sent across the cluster boundary.
seqNoStats is populated at the shard level regardless of which stats flags are set, so the globalCheckpoint read still works correctly.

@github-actions
Copy link
Copy Markdown

Persistent review updated to latest commit fdf3669
